Instance Selection in Semi-supervised Learning
Abstract
Semi-supervised learning methods use abundant unlabeled data to help learn a better classifier when the number of labeled instances is very small. A common method is to select the unlabeled instances on which the current classifier has high classification confidence, label them, add them to the labeled training set, and then update the classifier. This method is widely used in two paradigms of semi-supervised learning: self-training and co-training. However, the original labeled instances are more reliable than the self-labeled instances, which are labeled by the classifier. If unlabeled instances are assigned wrong labels and then used to update the classifier, classification accuracy will be jeopardized. In this paper, we present a new instance selection method based on the original labeled data (ISBOLD). ISBOLD considers not only the prediction confidence of the current classifier on unlabeled data but also the classifier's performance on the original labeled data only. In each iteration, ISBOLD uses the change in accuracy of the newly learned classifier on the original labeled data as a criterion for deciding whether the selected most confident unlabeled instances are accepted into the next iteration. We conducted experiments in self-training and co-training scenarios using Naive Bayes as the base classifier. Experimental results on 26 UCI datasets show that ISBOLD can significantly improve the accuracy and AUC of self-training and co-training.
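The selection loop described in the abstract can be illustrated with a minimal Python sketch of confidence-based self-training plus an ISBOLD-style acceptance check. It rests on several assumptions that are not from the paper: features are discrete, the base classifier is a small from-scratch Naive Bayes with add-one smoothing, a batch of the `k` most confident instances is accepted only if accuracy on the original labeled data does not drop, and a rejected batch is simply discarded from the pool. The names `DiscreteNaiveBayes`, `accuracy`, and `self_train` are hypothetical, and co-training is not shown.

```python
import math
from collections import Counter, defaultdict


class DiscreteNaiveBayes:
    """Hypothetical Naive Bayes for discrete features with add-one smoothing."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.class_counts = Counter(y)
        self.n = len(y)
        self.counts = Counter()          # (feature, value, class) -> count
        self.values = defaultdict(set)   # feature -> distinct values seen
        for row, label in zip(X, y):
            for j, v in enumerate(row):
                self.counts[(j, v, label)] += 1
                self.values[j].add(v)
        return self

    def predict_proba(self, row):
        logs = {}
        for c in self.classes:
            s = math.log((self.class_counts[c] + 1) / (self.n + len(self.classes)))
            for j, v in enumerate(row):
                s += math.log((self.counts[(j, v, c)] + 1)
                              / (self.class_counts[c] + len(self.values[j])))
            logs[c] = s
        top = max(logs.values())         # normalize in log space for stability
        z = sum(math.exp(s - top) for s in logs.values())
        return {c: math.exp(s - top) / z for c, s in logs.items()}

    def predict(self, row):
        proba = self.predict_proba(row)
        return max(proba, key=proba.get)


def accuracy(model, X, y):
    return sum(model.predict(r) == t for r, t in zip(X, y)) / len(y)


def self_train(X_lab, y_lab, X_unlab, k=2, max_iter=10, isbold=True):
    """Confidence-based self-training. With isbold=True, a batch is kept only if
    accuracy on the ORIGINAL labeled data does not drop (assumed rule)."""
    X_train, y_train = list(X_lab), list(y_lab)
    pool = list(X_unlab)
    model = DiscreteNaiveBayes().fit(X_train, y_train)
    best_acc = accuracy(model, X_lab, y_lab)
    for _ in range(max_iter):
        if not pool:
            break
        # Rank the pool by the current classifier's top posterior probability.
        pool.sort(key=lambda r: max(model.predict_proba(r).values()), reverse=True)
        batch, pool = pool[:k], pool[k:]
        pseudo = [model.predict(r) for r in batch]
        candidate = DiscreteNaiveBayes().fit(X_train + batch, y_train + pseudo)
        if isbold:
            acc = accuracy(candidate, X_lab, y_lab)
            if acc < best_acc:
                continue  # reject: batch is discarded, model unchanged (assumption)
            best_acc = acc
        X_train += batch
        y_train += pseudo
        model = candidate
    return model, len(y_train)
```

For example, `self_train(X_lab, y_lab, X_unlab, k=2)` returns the final classifier and the size of the enlarged training set. Passing `isbold=False` gives plain confidence-based self-training, which accepts every batch regardless of its effect on the original labeled data.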
Related resources
Wised Semi-Supervised Cluster Ensemble Selection: A New Framework for Selecting and Combing Multiple Partitions Based on Prior knowledge
The Wisdom of Crowds, an innovative theory described in social science, claims that the aggregate decisions made by a group will often be better than those of its individual members if the four fundamental criteria of this theory are satisfied. This theory has been applied to clustering problems. Previous research has shown that this theory can significantly increase the stability and performance of...
Instance Selection Method for Improving Graph-Based Semi-supervised Learning
Graph-based semi-supervised learning (GSSL) is one of the most important semi-supervised learning (SSL) paradigms. Though GSSL methods are helpful in many situations, they may hurt performance when using unlabeled data. In this paper, we propose a new GSSL method GsslIs based on instance selection in order to reduce the chances of performance degeneration. Our basic idea is that given a set of ...
Region Selection based on Evidence Confidence for Localized Content-Based Image Retrieval
Over the past decade, multiple-instance learning (MIL) has been successfully utilized to model the localized content-based image retrieval (CBIR) problem, in which a bag corresponds to an image and an instance corresponds to a region in the image. However, existing feature representation schemes are not effective enough to describe the bags in MIL, which hinders the adaptation of sophisticated ...
A diffusion approach for interactive image retrieval
In this paper, we study the problem of using multiple-instance semi-supervised learning to solve the image relevance feedback problem. Many multiple-instance learning algorithms have been proposed to tackle this problem; most of them use only a global representation of images. In this paper, we present a semi-supervised version of multiple-instance learning. By taking into account both the multiple...
Instance-level Semisupervised Multiple Instance Learning
Multiple instance learning (MIL) is a branch of machine learning that attempts to learn information from bags of instances. Many real-world applications such as localized content-based image retrieval and text categorization can be viewed as MIL problems. In this paper, we propose a new graph-based semi-supervised learning approach for multiple instance learning. By defining an instance-level g...
Publication date: 2011